Video Super-Resolution Transformer
Video super-resolution (VSR), with the aim to restore a high-resolution video
from its corresponding low-resolution version, is a spatial-temporal sequence
prediction problem. Recently, the Transformer has been gaining popularity due to
its parallel computing ability for sequence-to-sequence modeling. Thus, it
seems straightforward to apply the vision Transformer to VSR.
However, the typical block design of Transformer with a fully connected
self-attention layer and a token-wise feed-forward layer does not fit well for
VSR due to the following two reasons. First, the fully connected self-attention
layer neglects to exploit the data locality because this layer relies on linear
layers to compute attention maps. Second, the token-wise feed-forward layer
lacks the feature alignment which is important for VSR since this layer
independently processes each of the input token embeddings without any
interaction among them. In this paper, we make the first attempt to adapt
Transformer for VSR. Specifically, to tackle the first issue, we present a
spatial-temporal convolutional self-attention layer with a theoretical
understanding to exploit the locality information. For the second issue, we
design a bidirectional optical flow-based feed-forward layer to discover the
correlations across different video frames and also align features. Extensive
experiments on several benchmark datasets demonstrate the effectiveness of our
proposed method. The code will be available at
https://github.com/caojiezhang/VSR-Transformer
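The contrast between a per-token linear projection and a convolutional self-attention layer can be sketched minimally in NumPy. This is a toy illustration under assumed shapes, not the authors' implementation: queries, keys, and values are produced by a small 1D convolution over neighboring tokens, so each attention score already incorporates local context instead of treating tokens independently.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_tokens(x, w):
    # x: (n_tokens, c_in), w: (k, c_in, c_out); zero-padded "same" conv,
    # so each output token mixes in its k-neighborhood (locality)
    k, c_in, c_out = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    n = x.shape[0]
    out = np.zeros((n, c_out))
    for i in range(n):
        out[i] = np.einsum('kc,kcd->d', xp[i:i + k], w)
    return out

def conv_self_attention(x, wq, wk, wv):
    # convolutional self-attention (toy version): Q/K/V come from
    # convolutions over neighboring tokens rather than per-token
    # linear layers, so attention scores carry local context
    q, k_, v = (conv1d_tokens(x, w) for w in (wq, wk, wv))
    scores = softmax(q @ k_.T / np.sqrt(q.shape[1]))
    return scores @ v
```

In a full spatial-temporal version the 1D convolution would become a 3D convolution over the frame stack; the locality argument is the same.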
Adversarial Variational Embedding for Robust Semi-supervised Learning
Semi-supervised learning is sought to leverage unlabelled data when labelled
data is difficult or expensive to acquire. Deep generative models (e.g., the
Variational Autoencoder (VAE)) and semi-supervised Generative Adversarial
Networks (GANs) have recently shown promising performance in semi-supervised
classification owing to their excellent discriminative representation ability. However,
the latent code learned by a traditional VAE is not exclusive (repeatable)
for a specific input sample, which prevents it from achieving excellent
classification performance. In particular, the learned latent representation depends on a
non-exclusive component which is stochastically sampled from the prior
distribution. Moreover, semi-supervised GAN models generate data from a
pre-defined distribution (e.g., Gaussian noise) that is independent of the
input data distribution, which may hinder convergence and makes the
distribution of the generated data difficult to control. To address the aforementioned
issues, we propose a novel Adversarial Variational Embedding (AVAE) framework
for robust and effective semi-supervised learning to leverage both the
advantage of GAN as a high quality generative model and VAE as a posterior
distribution learner. The proposed approach first produces an exclusive latent
code by the model which we call VAE++, and meanwhile, provides a meaningful
prior distribution for the generator of the GAN. The proposed approach is
evaluated on four different real-world applications, and we show that our
method outperforms state-of-the-art models, which confirms that the
combination of VAE++ and GAN can provide significant improvements in
semi-supervised classification.
Comment: 9 pages, Accepted by Research Track in KDD 201
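The "exclusive versus stochastic" latent-code distinction can be illustrated with a toy NumPy encoder. All shapes and the linear encoder head here are assumptions for illustration, not the paper's architecture: in a vanilla VAE the sampled noise makes the code non-repeatable across encodings, while an AVAE-style pipeline reuses the deterministic mean as the data-dependent prior code handed to the GAN generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w_mu, w_logvar):
    # toy linear VAE encoder head (hypothetical shapes): mu is the
    # deterministic part of the code; the sampled noise added below
    # is the non-exclusive component the abstract criticizes
    return x @ w_mu, x @ w_logvar

d_in, d_z = 8, 4
w_mu = rng.normal(size=(d_in, d_z))
w_logvar = rng.normal(size=(d_in, d_z))
x = rng.normal(size=(16, d_in))

mu, logvar = encoder(x, w_mu, w_logvar)
# vanilla VAE code: changes on every encoding pass (reparameterization)
z_vanilla = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
# AVAE-style exclusive code: identical every time x is re-encoded,
# and usable as a meaningful, data-dependent prior for the generator
z_exclusive = mu
```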
Internal Wasserstein Distance for Adversarial Attack and Defense
Deep neural networks (DNNs) are vulnerable to adversarial examples that can
trigger misclassification of DNNs but may be imperceptible to human perception.
Adversarial attack has been an important way to evaluate the robustness of
DNNs. Existing attack methods construct adversarial examples using a
pixel-wise Lp distance as the similarity metric to perturb samples. However, this
kind of metric is incompatible with the underlying real-world image formation
and human visual perception. In this paper, we first propose an internal
Wasserstein distance (IWD) to measure image similarity between a sample and its
adversarial example. We apply IWD to perform adversarial attack and defense.
Specifically, we develop a novel attack method by capturing the distribution of
patches in original samples. In this case, our approach is able to generate
semantically similar but diverse adversarial examples that are more difficult
to defend by existing defense methods. Relying on IWD, we also build a new
defense method that seeks to learn robust models to defend against unseen
adversarial examples. We provide both thorough theoretical and empirical
evidence to support our methods.
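The core idea of an internal Wasserstein distance, comparing the internal patch distributions of a sample and its adversarial example, can be sketched in NumPy. The patch size, the non-overlapping extraction, and the use of a sliced (random-projection) 1-Wasserstein approximation are all simplifying assumptions, not the paper's formulation.

```python
import numpy as np

def patches(img, k=4):
    # non-overlapping k x k patches, each flattened to a vector;
    # together they form the image's "internal" patch distribution
    h, w = img.shape
    ps = [img[i:i + k, j:j + k].ravel()
          for i in range(0, h - k + 1, k)
          for j in range(0, w - k + 1, k)]
    return np.stack(ps)

def sliced_w1(a, b, n_proj=64, seed=0):
    # sliced 1-Wasserstein between two sets of patch vectors: project
    # onto random unit directions and average the 1D W1 distances
    # (sorting works because both sets have equal sample counts here)
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        d = rng.normal(size=a.shape[1])
        d /= np.linalg.norm(d)
        pa, pb = np.sort(a @ d), np.sort(b @ d)
        total += np.abs(pa - pb).mean()
    return total / n_proj

def internal_distance(x, x_adv, k=4):
    # hedged stand-in for IWD: measure how far the adversarial
    # example's patch distribution drifts from the original's
    return sliced_w1(patches(x, k), patches(x_adv, k))
```

A perturbation that rearranges content while preserving patch statistics scores low under such a metric even when its pixel-wise Lp distance is large, which is the motivating gap.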
Inheriting Bayer's Legacy: Joint Remosaicing and Denoising for Quad Bayer Image Sensor
Pixel binning based Quad sensors have emerged as a promising solution to
overcome the hardware limitations of compact cameras in low-light imaging.
However, binning results in lower spatial resolution and non-Bayer CFA
artifacts. To address these challenges, we propose a dual-head joint
remosaicing and denoising network (DJRD), which converts a noisy Quad Bayer
pattern into a standard noise-free Bayer pattern without any resolution loss.
DJRD includes a newly designed Quad Bayer remosaicing (QB-Re) block and
integrated denoising modules based on a Swin Transformer and a multi-scale
wavelet transform.
The QB-Re block constructs its convolution kernels according to the CFA
pattern to achieve a periodic color distribution in the receptive field, which
is used to extract exact spectral information and reduce color misalignment. The
integrated Swin-Transformer and multi-scale wavelet transform capture non-local
dependencies, frequency and location information to effectively reduce
practical noise. By identifying challenging patches utilizing Moiré and zipper
detection metrics, we enable our model to concentrate on difficult patches
during the post-training phase, which enhances the model's performance in hard
cases. Our proposed model outperforms competing models by approximately 3 dB,
without additional hardware or software complexity.
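The CFA-periodic weight sharing behind a block like QB-Re can be sketched as a convolution whose kernel is selected by each pixel's phase within the 4x4 Quad Bayer period, so pixels sampling the same color share weights. The kernel shapes and the plain nested-loop convolution are illustrative assumptions, not the paper's operator.

```python
import numpy as np

def cfa_periodic_conv(raw, kernels):
    # raw: (h, w) single-channel mosaic; kernels: (4, 4, k, k),
    # one k x k kernel per phase of the 4x4 Quad Bayer period, so
    # pixels that share a CFA phase share convolution weights and the
    # effective sampling pattern is color-periodic
    k = kernels.shape[2]
    pad = k // 2
    rp = np.pad(raw, pad)
    h, w = raw.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (rp[i:i + k, j:j + k] * kernels[i % 4, j % 4]).sum()
    return out
```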
Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising
Video denoising aims at removing noise from videos to recover clean ones.
Some existing works show that optical flow can help the denoising by exploiting
the additional spatial-temporal clues from nearby frames. However, the flow
estimation itself is also sensitive to noise, and can be unusable under large
noise levels. To this end, we propose a new multi-scale refined optical
flow-guided video denoising method, which is more robust to different noise
levels. Our method mainly consists of a denoising-oriented flow refinement
(DFR) module and a flow-guided mutual denoising propagation (FMDP) module.
Unlike previous works that directly use off-the-shelf flow solutions, DFR first
learns robust multi-scale optical flows, and FMDP makes use of the flow
guidance by progressively introducing and refining more flow information from
low resolution to high resolution. Together with real noise degradation
synthesis, the proposed multi-scale flow-guided denoising network achieves
state-of-the-art performance on both synthetic Gaussian denoising and real
video denoising. The code will be made publicly available.
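The flow-guided propagation step can be sketched as warping a neighboring frame toward the current one and averaging. This is a heavily simplified stand-in (nearest-neighbor sampling, a single neighbor, plain averaging) for the paper's FMDP module, intended only to show why an accurate flow lets temporal neighbors contribute to denoising.

```python
import numpy as np

def warp(frame, flow):
    # backward-warp a neighbor frame by a dense flow field
    # (flow[..., 0] = horizontal, flow[..., 1] = vertical);
    # nearest-neighbor sampling keeps the sketch short
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def flow_guided_denoise(cur, neighbor, flow):
    # toy stand-in for flow-guided propagation: align the neighbor to
    # the current frame with the (refined) flow, then average so that
    # independent noise in the two frames partially cancels
    return 0.5 * (cur + warp(neighbor, flow))
```

With a noisy flow estimate the warped neighbor is misaligned and the average blurs structure instead of cancelling noise, which is why the abstract argues for a denoising-oriented flow refinement stage.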